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Abstract 


Observation-based  object  modeling  often  requires  integration  of  shape  descriptions  from 
different  views.  In  current  conventional  methods,  to  sequentially  merge  multiple  views, 
accurate  description  of  each  surface  patch  has  to  be  precisely  known  in  each  view  and  trans¬ 
formation  between  any  adjacent  views  needs  to  be  accurately  recovered.  When  noisy  data 
and  mismatches  are  present,  recovered  transformation  becomes  erroneous.  In  addition,  the 
transformation  error  accumulates  and  propagates  along  the  sequence,  which  results  in  an 
inaccurate  object  model.  To  overcome  these  problems,  we  have  developed  a  weighted  least 
square  (WLS)  approach  which  simultaneously  recovers  object  shape  and  transformation 
among  different  views  without  recovering  inter-frame  motion  as  an  intermediate  step. 

We  show  that  object  modeling  from  a  sequence  of  range  images  is  a  problem  of  principal 
component  analysis  with  missing  data  (PCAMD),  which  can  be  generalized  as  a  WLS  mini¬ 
mization  problem.  An  efficient  algorithm  is  devised  to  solve  the  problem  of  PCAMD.  After 
we  have  segmented  surface  regions  in  each  view  and  tracked  over  all  the  sequence,  we  con¬ 
struct  a  3  FxP  normal  measurement  matrix  of  surface  normals,  and  an  F  xP  distance  mea¬ 
surement  matrix  of  normal  distances  to  the  origin  for  all  visible  P  regions  appeared  over  the 
whole  sequence  of  F  views,  respectively.  These  two  measurement  matrices,  which  have 
many  missing  elements  due  to  noise,  occlusion  and  mismatching,  enable  us  to  formulate 
multiple  view  merging  as  a  combination  of  two  WLS  problems.  A  two-step  algorithm, 
which  employs  the  quaternion  representation  of  the  rotation  matrix,  is  presented  to  compute 
surface  descriptions  and  transformations  among  different  views  simultaneously.  After  sur¬ 
face  equations  are  extracted,  spatial  connectivity  among  these  surfaces  is  established  to 
enable  the  object  model  to  be  reconstructed. 

Experiments  using  synthetic  data  and  real  range  images  show  that  our  approach  is  robust 
against  noise  and  mismatching  and  generates  accurate  object  model  by  averaging  over  all 
visible  surfaces.  Specifically,  using  a  sequence  of  real  range  images,  we  illustrate  the  recon¬ 
struction  of  a  toy  house  model. 
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1  Introduction 

Solid  modeling  is  a  useful  tool  for  tasks  such  as  representing  the  virtual  environment  for 
virtual  reality  systems;  representing  the  real  environment  for  robot  programming;  and 
modeling  real  objects  for  object  recognition.  Currently,  most  object  models  are  con¬ 
structed  by  human  operators  [2].  It  would  be  much  better  to  have  a  system  that  can  auto¬ 
matically  build  models  of  real  objects  that  it  observes.  If  we  can  develop  a  reliable 
technique  to  generate  accurate  3D  object  models  by  observing  real  objects  from  multiple 
views,  we  can  reduce  the  effort  and  cost  of  model  construction,  and  we  can  significantly 
broaden  the  application  areas  of  solid  modeling. 

Observation-based  modeling  systems  usually  work  with  a  sequence  of  images  of  the 
object(s),  where  the  sequence  spans  a  smoothly  varying  change  in  the  positions  of  the  sen¬ 
sor  and/or  object(s).  Most  previous  systems  have  attempted  to  apply  inter-frame  motion 
estimates  to  successive  pairs  of  views  in  a  sequential  manner  [12].  Whenever  a  new  view 
is  introduced,  it  is  matched  with  the  previous  view,  and  the  transformation  between  these 
two  successive  views  has  to  be  recovered  before  the  object  model  is  updated.  This  sequen¬ 
tial  method  does  not  work  well  in  practice  because  local  motion  estimates  are  subject  to 
noise  and  missing  data.  Local  mismatching  errors  accumulate  and  propagate  along  the 
sequence,  yielding  erroneous  object  models. 

Rather  than  sequentially  integrating  successive  pairs  of  view,  we  should  instead  search  for 
the  statistically  optimal  object  model  that  is  most  consistent  with  all  the  views.  Although 
every  single  view  provides  only  partial  information  of  the  object,  it  is  likely  that  any  part 
of  the  object  will  be  observed  a  number  of  times  along  the  sequence.  Object  modeling 
from  this  sequence  of  views  can  be  formulated  as  an  overdetermined  minimization  prob¬ 
lem  because  significant  redundancy  exists  among  all  the  views. 

1.1  Past  work 

Much  work  has  been  done  in  object  modeling  from  a  sequence  of  range  images  [4],  Most 
work  assumed  that  transformation  between  successive  views  is  either  known  or  can  be 
recovered,  so  that  all  data  can  be  transformed  to  a  fixed  coordinate  system.  For  example, 
Bhanu  [3]  rotated  the  object  through  known  angles.  Ahuja  and  Veenstra  [  1  ]  constructed  an 
octree  object  model  from  orthogonal  views.  Soucy  and  Laurendeau  [16]  proposed  to  trian¬ 
gulate  each  view  and  merge  multiple  views  via  a  Venn  diagram  when  the  transformation  is 
known.  Because  building  a  Venn  diagram  is  combinatorial,  only  four-view  merging  is  pre- 
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sented  in  their  work.  By  finding  the  correspondences  from  intensity  patterns  in  all  eight 
views,  Vemuri  and  Aggarwal  [21]  derived  the  inter- frame  motion  and  transformed  all  eight 
range  images  to  the  first  frame.  Feme  and  Levine  [9]  merged  multiple  views  using  corre¬ 
spondence  points  which  are  identified  by  correlation  over  the  differential  properties  of  the 
surface.  Parvin  and  Medioni  [12]  proposed  to  construct  boundary  representation  (B-rep) 
object  models  from  unregistered  multiple  range  images.  Each  view  of  the  object  is  repre¬ 
sented  as  an  adjacency  graph  where  nodes  represent  surface  patches  with  attributes  and  arcs 
representing  adjacency  between  surfaces.  To  merge  any  two  views,  a  rigid  transformation 
has  to  be  computed  accurately.  Most  of  previous  approaches  to  modeling  from  a  sequence  of 
views  are  sequential.  Thus,  transformation  errors  accumulate  and  propagate  from  one 
matching  to  another,  which  may  result  in  imprecise  object  models. 

Inferring  scene  geometry  and  camera  motion  from  a  sequence  of  intensity  image  is  also  pos¬ 
sible  in  principle.  For  example,  Tomasi  and  Kanade  [19]  proposed  a  factorization  method  to 
simultaneously  solve  shape  and  motion  under  orthography,  and  Poelman  and  Kanade  [  1 3] 
extend  it  to  the  case  of  paraperspective  projection.  Szeliski  and  Kang  [18]  proposed  a  non¬ 
linear  optimization  method  to  solve  shape  and  motion  under  perspective.  However,  in 

[19] [13],  the  task  is  formulated  as  a  least  squares  problem  where  missing  data  due  to  occlu¬ 
sion  and  mismatching  is  extrapolated  from  measured  data  and  estimated  motion.  Although 
three  views  of  four  points  is  theoretically  sufficient  in  determining  structure  and  motion 

[20] ,  it  is  difficult  in  practice  to  find  a  good  submatrix  to  do  “row-wise”  and  “column-wise” 
extrapolation.  Szeliski  and  Kang  [18]  proposed  to  assign  a  weight  to  each  measurement  and 
incorporated  an  object-oriented  perspective  projection  in  a  nonlinear  least  squares  problem. 
The  very  nature  of  the  nonlinear  least  squares  formulation  requires  standard  techniques  in 
nonlinear  optimization,  e.g.,  Levenberg-Marquardt,  in  which  convergence  to  a  local  mini¬ 
mum  may  be  a  problem.  In  addition,  most  existing  algorithms  seem  to  be  more  useful  for 
determining  camera  motion  than  for  building  3D  object  models  because  the  recovered 
object  shape  is  defined  by  a  collection  of  3D  points  whose  connectivity  is  not  explicitly 
known. 

The  factorization  method  [5][13][19],  in  essence,  is  principal  component  analysis  of  some 
measurement  matrix.  Principal  component  analysis  expresses  the  variance  of  the  measure¬ 
ment  matrix  in  a  compact  and  robust  way  and  has  been  extensively  studied  in  computational 
statistics  [7],  The  singular  value  decomposition  (SVD)  method  [10]  is  a  straightforward 
solution  when  the  measurement  matrix  is  complete.  When  data  is  incomplete  or  missing,  as 
often  the  case  in  practice,  principal  component  analysis  becomes  much  more  complicated. 
Ruhe  [15]  first  proposed  a  Gauss-Newton  algorithm  to  solve  this  problem,  taking  advantage 
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of  the  sparse,  structured  derivatives  of  the  object  function.  Wiberg  [22]  generalized  Ruhe’s 
work  to  the  cases  where  the  rank  of  the  measurement  matrix  is  known.  However,  Wiberg’s 
algorithm  requires  solving  large  pseudo-inverse  matrices. 

1,2  Our  approach  to  multiple  view  merging 

We  propose  to  build  object  models  from  a  sequence  of  range  images.  Our  approach  is  to 
recover  bounding  surfaces  and  transformations  simultaneously  by  employing  principal  com¬ 
ponent  analysis  with  missing  data.  After  segmenting  ail  range  images  and  establishing  cor¬ 
respondences  among  different  views,  a  composite  graph  is  built.  The  object  surface 
description  and  transformations  among  different  views  are  recovered  by  solving  a  combina¬ 
tion  of  two  weighted  least  squares  (WLS)  problems. 

There  are  two  key  observations  to  our  approach  of  object  modeling  from  a  sequence  of 
views.  First,  because  of  the  redundancy  in  the  sequence  of  images,  we  can  get  a  reliable 
solution  from  an  overconstrained  minimization  problem  even  when  data  is  missing.  Because 
only  part  of  the  object  is  visible  in  each  view  we  cannot  find  correspondences  among  all  sur¬ 
faces  between  two  views.  Therefore,  this  is  not  a  least  squares  (LS)  problem  but  a  WLS  one 
where  the  weights  are  zeros  for  invisible  regions.  The  difficulty  is  how  to  formulate  the 
WLS  problem  properly  and  how  to  solve  this  problem  without  resorting  to  extrapolation  of 
the  unknowns.  We  present  an  algorithm  to  iteratively  update  the  surface  description  and 
transformation  so  that  the  weighted  least  squares  error  is  minimized. 

The  second  observation  lies  in  the  first  WLS  problem  of  recovering  surface  normals  and 
rotation  matrices.  The  modeling  problem  can  be  decomposed  into  two  smaller  problems 
because  recovering  rotation  is  independent  of  translation.  If  we  directly  apply  the  WLS 
algorithm,  we  have  to  explicitly  update  nine  parameters  of  every  rotation  matrix.  It  is  well- 
known  that  the  rotation  matrix  is  a  nonlinear  function  of  only  three  independent  parameters. 
Therefore,  updating  nine  parameters  (even  with  proper  normalization  afterwards)  is  not  the 
best  way  to  solve  this  problem.  We  solve  this  problem  by  representing  rotations  using 
quaternions. 

To  make  an  object  model  from  a  recovered  set  of  surface  equations,  spatial  connectivity 
among  surfaces  has  also  to  be  recovered.  Spatial  connectivity  refers  to  the  spatial  relation¬ 
ship  among  surfaces,  i.e.,  for  each  surface,  what  other  surfaces  it  is  connected  to.  The  prob¬ 
lem  of  surface  connectivity  is  reduced  to  one  of  connectivity  of  supporting  lines  of  a  simple 
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polygon,  solved  by  a  modified  Jarvis’  inarch  algorithm  that  combines  information  on  both 
algebraic  level  and  signal  level. 

U  Organization  of  paper 

In  section  2  we  discuss  principal  component  analysis  when  data  is  missing.  From  a  motiva¬ 
tional  example  of  modeling  12-faced  polyhedra  from  a  sequence  of  views,  we  formulate  the 
multiple  view  merging  as  a  problem  of  principal  component  analysis  with  missing  data 
(PC  AMD).  Then  we  outline  Wiberg’s  formulation  of  PC  AMD,  and  modify  the  formulation 
by  proper  indexing  of  the  objective  function.  The  modified  formulation  is  then  generalized 
to  a  WLS  minimization  problem.  An  efficient  PCAMD  algorithm  is  presented  to  solve  this 
WLS  problem.  In  section  3  we  formulate  modeling  the  object  and  recovering  transforma¬ 
tions  as  a  combination  of  two  WLS  problems.  We  compute  the  surface  description  and 
transformation  by  extracting  the  principal  components  of  two  highly  rank-deficient  mea¬ 
surement  matrices  with  many  missing  elements,  each  of  which  forms  a  WLS  problem.  The 
first  WLS  problem  of  recovering  rotation  matrices  and  surface  normals  is  further  simplified 
using  the  quaternion  representation  of  rotation.  A  two-step  algorithm  is  presented  to  model 
the  object  from  a  sequence  of  segmented  range  images.  Section  4  gives  a  brief  description  of 
a  surface  patch  tracking  system.  Different  modules  in  the  tracking  system,  range  image  seg¬ 
mentation,  adjacency  graph  building,  and  two-view  merging  are  presented.  In  section  5  we 
show  that  the  problem  of  surface  connectivity  can  be  reduced  to  one  of  connectivity  of  sup¬ 
porting  lines  of  a  simple  polygon.  Since  the  problem  of  establishing  connectivity  of  support¬ 
ing  lines  can  be  regarded  as  both  a  modified  convex  hull-like  problem  and  a  cell 
decomposition  problem,  we  propose  a  modified  Jarvis’  march  algorithm  which  successfully 
reconstruct  the  simple  polygon.  We  demonstrate  applicability  and  robustness  of  proposed 
PCAMD  method  by  applying  our  approach  to  synthetic  data  and  real  range  images  in  Sec¬ 
tion  6.  We  show  that,  given  correspondence,  four  views  are  necessary  to  recover  the  shape 
of  an  arbitrary  dodecahedron  (12-face  polyhedral  A  study  on  synthetic  data  of  a  dodecahe¬ 
dron  shows  that  our  approach  is  robust  with  respect  to  noise  and  surface  mismatching.  Our 
method  yields  a  statistically  optimal  model  for  a  given  set  of  views;  this  method  improves  as 
more  views  are  incorporated.  From  a  sequence  of  real  range  images,  a  polyhedral  object 
model  is  precisely  recovered  using  the  proposed  method.  A  complex  toy  house  mode!  is  also 
reconstructed  from  a  sequence  of  range  images.  Final  comments  and  conclusions  are  pre¬ 
sented  in  Section  7. 
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2  Principal  Component  Analysis  with  Missing  Data 

2.1  Motivational  example 

Suppose  that  our  task  is  to  make  a  model  for  a  dodecahedron  (/2-faced  poiyhedra)  from  a 
sequence  of  segmented  range  images.  A  dodecahedron  is  a  simple  and  interesting  Platonic 
solid.  Assume  that  we  have  tracked  12  faces  over  4  nonsingular  views.  The  segmented 

range  images  provide  trajectories  of  plane  coordinates  {p^|  /=/ . 4.  P=i . 12),  where 

p  »  (vr, d)T  represents  a  planar  equation  with  surface  normal  v  and  normal  distance  to  the 
origin  d.  Then  we  may  form  a  i6x  12  measurement  matrix  as  follows: 
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where  every  *  indicates  an  unobservable  face  since  there  are  only  six  visible  faces  from  each 
nonsingular  view.  Our  modeling  task  is  now  to  recover  the  poses  of  all  the  12  faces  in  a 
fixed  coordinate  system. 


V3  V4 


Figure  1  Distinct  views  of  a  dodecahedron 
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If  the  measurement  matrix  were  complete,  our  task  would  be  to  average  all  those  12  faces 
over  4  views  assuming  data  is  noisy.  In  the  absence  of  noise,  any  set  of  12  faces  from  one  of 
4  views  will  do.  The  standard  way  to  solve  this  problem  is  to  apply  SVD  to  this  measure¬ 
ment  matrix,  whose  rank  is  at  most  4  (see  section  3.1  for  the  argument).  The  measurement 
matrix  can  subsequently  be  factorized,  with  proper  normalization,  into  a  left  matrix  Q  of 
transformation  parameters  and  a  right  matrix  P  of  plane  coordinates 

W  =  Q  P 


and  is  the  transformation  of fth  view  with  respect  to  the  fixed  world  coordinate  system, 
and  pp  is  the  pth  plane  equation  in  the  same  world  coordinate  system.  Singular  value 
decomposition  has  also  been  successfully  applied  to  shape  and  motion  recovery  from  a 
sequence  of  intensity  images  [19]. 

Unfortunately,  the  measurement  matrix  is  often  incomplete  in  practice;  it  is  not  unusual  for  a 
large  portion  of  the  matrix  to  be  unobservable.  As  we  have  seen  in  the  above  example,  half 
of  the  measurement  matrix  is  unknown.  When  the  percentage  of  missing  data  is  very  small, 
it  is  possible  to  replace  the  missing  elements  with  the  mean  or  an  extreme  value;  this  is  a 
common  strategy  in  multivariate  statistics  [7].  However,  such  an  approach  is  no  longer  valid 
when  a  significant  portion  of  the  measurement  matrix  is  unknown. 

One  common  practice  in  modeling  from  a  sequence  of  images  is  to  use  extrapolation.  For 
example,  we  can  recover  the  transformation  between  view  1  and  view  2  if  there  are  at  least 
three  matched  planar  surfaces  that  are  non-parallel  [8].  Then  we  extrapolate  the  invisible 
planar  surfaces  in  view  1  from  its  corresponding  surfaces  in  view  2  which  are  visible  using 
the  transformation  recovered.  And  apply  the  same  extrapolation  to  invisible  surfaces  in  view 
1.  By  repeating  this  process  we  can  in  principle  extrapolate  the  locations  of  all  invisible  sur¬ 
faces  from  visible  surfaces  [  12].  A  final  step  could  be  added  to  fine-tune  the  result  by  factor¬ 
izing  the  extrapolated  measurement  matrix  using  SVD#  A  similar  extrapolation  approach 
“propagation  method”  [19]  is  used  in  motion  and  shape  recovery  from  a  sequence  of  inten¬ 
sity  images. 
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The  major  problem  with  the  extrapolation  method  is  that  once  the  estimated  transformation 
is  incorrect  at  any  step,  the  extrapolated  results  will  be  erroneous.  In  sequential  modeling 
errors  accumulate  and  propagate  all  the  way.  The  fine-tuning  process  at  the  last  step  would 
not  improve  the  result  dramatically  since  the  extrapolated  measurement  matrix  is  inaccurate. 
To  obviate  this  problem,  we  make  use  of  more  rigorous  mathematical  tools  developed  in 
computational  statistics  that  caters  for  missing  data  without  resorting  to  error  sensitive 
extrapolation.  We  will  demonstrate  the  formulation  in  this  section  and  apply  it  to  multiple 
view  merging  in  the  next  section. 

2.2  Wiberg’s  formulation 

The  problem  of  object  modeling  from  a  sequence  of  views  shown  in  the  previous  section 
can  be  formulated  as  a  problem  of  principal  component  analysis  with  missing  data 
(PCAMD),  which  has  been  extensively  studied  in  computational  statistics.  Ruhe  [15]  pro¬ 
posed  a  minimization  method  to  analyze  one  component  model  when  observations  are  miss¬ 
ing.  One  component  model  decomposes  an  FxP  measurement  matrix  into  an  fxi  left 
matrix  and  a  I  xp  right  matrix.  Wiberg  [22]  extended  Ruhe’s  method  to  the  more  general 
case  of  arbitrary  component  model.  We  first  outline  Wiberg’s  formulation  of  principal  com¬ 
ponent  analysis  with  missing  data,  before  proposing  a  modified  formulation  by  appropriate 
indexing,  and  generalizing  the  problem  as  a  WLS  problem. 

Suppose  that  anfxP  measurement  matrix  W  consists  of  P  individuals  from  an  F-variate 
normal  distribution  with  mean  p  and  covariance  I.  Let  the  rank  of  W  be  r.  If  the  data  is 
complete  and  the  measurement  matrix  filled,  the  problem  of  principal  component  analysis  is 
to  determine  if,  s,  and  v  such  that 

W-e[LT-USV  I 

is  minimized,  where  u,  and  v  are  F  x  r  and  P  x  r  matrices  with  orthogonal  columns, 
s  =  diag  (op  is  an  r  x  r  diagonal  matrix,  p  is  the  maximum  likelihood  approximation  of  the 

mean  vector  and  er  =  ( / . /)  is  an  F-tuple  vector  with  all  ones.  The  solution  to  this  problem 

is  essentially  SVD  of  the  centered  (or  registered)  data  matrix  w-  en  T. 

If  data  is  incomplete,  we  have  the  following  minimization  r  blem: 

1  2 

min  0  =  2^^Wfp~VLp~uf.vp.)~  (EQ  n 


/  =  {  (f,  p) :  Wj-  p  is  observed} 
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where  Uy  and  vp  are  column  vector  notations  defined  by 

i 

=  US2  (EQ2) 

and 

=  VS 2  (EQ  3) 


Lemma  1 

A  necessary  condition  to  uniquely  solve  this  problem  (EQ  l)is  m>r(F+P-r)  +  P 
where  m  is  the  number  of  observable  elements  in  W. 

Proof: 

It  is  trivially  true  that  there  are  at  most  r(F+P-r)  independent  elements  from  LU 
decomposition  of  an  F  x  P  matrix  of  rank  r.  Hence,  to  uniquely  solve  (EQ  l),  the  number  of 
equations  (m)  has  to  be  no  fewer  than  the  number  of  unknowns  (r{F+P-r)+P)).  (Q.E.D.) 

To  sufficiently  determine  the  problem  (EQ  1)  more  constraints  are  needed  to  normalize 
either  the  left  matrix  U  or  the  right  matrix  V. 


If  we  write  the  measurement  matrix  W  as  an  m -dimensional  vector  w,  the  minimization 
problem  can  be  written  as 


min 


(EQ4) 


where 


f  =  w  -  p  -  Fu  =  w-Gv 


(EQ5) 
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and 


F  and  G  are  of  dimension  mxrF  and  m  x  (r  +  1) P  respectively,  and  are  computed  by 
expanding  every  element  y;  of  f 

fi  =  wi-^(i) -“/(«). vp(«). 

where  the  ith  component  of  w  is  indexed  to  the  (fii),  p(i))- th  component  of  W,  i.e., 
wi  =  WfU),pU)' 311(1  W/.p  =  wi (/./»)• 

To  solve  the  minimization  problem  stated  by  (EQ  4),  the  derivative  of  the  objective  function 
(with  respect  to  u  and  v)  should  be  zero,  i.e.. 


FtFu  -  FT  (w  -  (i) 
GTGv  -  Grw 


=  0 


(EQ6) 


Obviously  (EQ  6)  is  nonlinear  because  F  is  a  function  of  v  and  G  is  a  function  of  u.  In  the¬ 
ory,  any  appropriate  nonlinear  optimization  method  can  be  applied  to  solve  it.  However,  the 
dimensionality  is  so  high  in  practice  that  we  have  to  adapt  the  algorithm  to  make  use  of  the 
special  structure  of  the  problem.  It  can  been  observed  that: 


(1)  For  fixed  u,  we  have  a  linear  least  squares  problem  of  v;  for  fixed  v,  we  have  a  linear 
least  problem  of  u; 

(2)  since  (EQ  6)  is  also  a  bilinear  problem  of  u  and  v,  we  can  successively  improve  their 
estimates  by  using  the  updating  technique  in  the  NIPALS  algorithm  [15],  i.e.,  for  a  given 
v,  u  is  updated  u  =  F*  (w  -  }i);  for  a  given  u,  v  is  updated  v  =  G+w.  F*  and  G+  are  the 
pseudo-inverses  of  F  and  G,  respectively. 


23  Modified  Wiberg’s  formulation 

In  practice  F  and  G  are  usually  sparse  matrices  with  many  zeros.  If  we  appropriately  index 
w  as  Wj ,  such  that 
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f,  =  w ,  -  fi  -  Hu 
or 


(EQ7) 


(EQ8) 


(EQ9) 


Because  f ,  and  f2  contain  the  same  observables  as  f. 


__  ft  =  = 

2  2  2 


(EQ  II) 


and 
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Hrtfu-//r(w1-(i) 
KrKv  -  KT w2 


(HQ  12) 


dr.  df-, 

as-*--  af 


(EQ  13) 


If  data  is  complete,  K  is  a  block  diagonal  matrix  of  dimension  FP  x  ( r  +  1 )  P,  whose  block 
elements  are  U  matrices  of  dimension  F  x  (r  +  1 ) ,  replicated  along  the  diagonal,  i.e.. 


U  0  0 

*  =  0  ••  0 
0  o'  U 


(EQ  14) 


"l.rl 


(EQ  15) 


x  (r  +  I) 

0 

0 

"tf, 

0 

o' 

K  = 

0 

* 

0 

= 

0 

. 

0 

0 

o’ 

^mpx  (r+  1) 

0 

o' 

Ur. 

when  data  is  incomplete,  the  elements  associated  with  the  missing  data  are  taken  out,  result¬ 
ing  in  a  matrix  of  dimension  m  x  (r+  \)P 


(EQ  16) 


where 

p  F 

XmP  =  m' md  mn  =  ILlf.P' 

p= i  /=! 

7 f  p  —  1  when  Wjp  is  observed  otherwise  Yy  p  =  0. 

Similarly,  when  data  is  incomplete,  we  have  the  following  matrix  of  dimension  mx  rF 


0 

0 

V, 

0 

o' 

• 

0 

= 

0 

0 

0 

v 

nFxr 

0 

o’ 

VF 

(EQ  17) 
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where 


F 

£  rif  -  m,  and  ny  = 
/=  i 


p 


p  =  i 


The  pseudo-inverse  matrices  of  H  and  K  can  be  easily  computed  because  of  their  block 
diagonal  structure. 


ft  = 


V\  0  0 

0  •  0 
_0  o'  ltpJ 


ft  = 


V\  0 

0  •- 


0 

0 


0  0  Vj| 


(EQ  18) 


(EQ  19) 


2.4  WLS  formulation 

The  minimization  problem  (EQ  1)  can  be  generalized  as  a  WLS  problem 

min  <D  =  5^  <7 f,p(Wf,p-Vip-^p))2  (EQ  20) 

fp 

where  is  weighting  factor  for  each  measurement  WftP. 

In  the  previous  discussion,  we  have  assumed  that  all  weights  are  either  one  when  data  is 
observable  or  zero  when  unobservable.  However,  in  many  cases  we  may  prefer  to  assign  dif¬ 
ferent  weights  other  than  ones  or  zeros  to  individual  measurement.  For  example,  in  recover¬ 
ing  the  pose  of  a  3D  plane,  we  can  assign  the  confidence  measurement  to  each  recovered 
surface  normal  by  its  incidence  angle  with  the  viewing  direction.  Different  sensor  models 
can  be  applied  to  obtain  a  weighting  matrix  if  necessary.  In  the  following,  we  formulate 
principal  component  analysis  with  missing  data  as  a  WLS  problem. 

We  introduce  two  FP  x  FP  diagonal  weight  matrices, 

r  *  diag(rrr2,...,rP)  <eq2d 
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O  =  diag(&r<b2 . Op 


(EQ  22) 


where 


f p  =  diag  (Yp(  j»  Yy,t 2>  •••»  Ypj  p » P  ~  •  ••» 


Oy  diag  ( Y i.y»  Y^y*  Y /*,/ )  ’  f  ~ 


The  minimization  problem  becomes 


min  <j>  = 


tT  f  fT  m 

*yl  *yI  ry2  Iy2 


(EQ  23) 


where 


•V,  =  T  f,  and  =  <b  f2. 


The  solution  to  the  above  problem  is  when  the  first  order  derivative  of  the  objective  function 
becomes  zero.  The  derivative  of  the  objective  function  is 


KyKyv-KyW2 


(EQ  24) 


where 


Hy  =  FH  = 


(EQ  25) 


Ky  =  <t>K  = 


*Fu 


(EQ  26) 


Therefore,  after  computing  the  pseudo  inverses  of  Hy  and  Ky 
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(vrr[r,v)  1  (vrr[) 

(v7’rJr/>vT1  ( vTrTp) 

(HQ  27) 


4 

(UT^{U)~\uT<J>Tl) 

(UT<t>TF<i>FU)~'  (f/rOj) 

(EQ  28) 


we  can  then  use  the  PC  AMD  (principal  component  analysis  with  missing  data)  algorithm  to 
solve  the  WLS  problem.  Our  formulation  is  essentially  a  modified  NIPALS  Ruhe-Wiberg 
algorithm.  The  algorithm  is  as  follows: 


Algorithm  PCAMD 

(1)  initialize  v 

(2)  update 

u  =  (wj-fi) 

(3)  update 

v  =  <w2 

(4)  stop  if  the  algorithm  converges,  or  go  back  to  (2). 


Remarks: 

(1)  Ruhe  [15]  also  suggested  using  Newton  and  Gauss  methods  to  speed  up  the 
convergence  of  the  NIPALS  method.  In  practice,  we  found  that  the  NIPALS  method 
converges  within  desired  tolerance  in  several  iterations  in  our  experiments. 

(2)  Ruhe  [15]  and  Wiberg  [22]  also  showed  that  the  more  the  missing  data,  the  worse  the 
result  will  be.  It  is  hardly  surprising  because  the  method  is  basically  an  interpolation 
of  all  observable  elements.  Statistically  this  corresponds  to  decreasing  robustness  of 
the  estimate  for  the  principal  components  given  the  observations.  Fortunately  object 
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modeling  from  multiple  views,  we  can  always  take  many  views  to  form  a  well 
constrained  problem  for  our  modeling  purpose.  Determining  a  minimally  acceptable 
number  of  views  can  be  regarded  as  a  sensor  planning  problem. 

(3)  The  missing  data  can  also  be  extrapolated  as  long  as  we  find  some  sub-blocks  in  the 
measurement  matrix  which  satisfy  Lemma  1.  The  issue  of  obtaining  those  blocks  is 
non-trivial.  Once  the  missing  data  have  been  augmented,  a  linear  or  nonlinear 
optimization  method  can  be  applied  to  solve  the  original  problem.  It  should  work 
well  if  data  is  noise-free,  i.e.,  only  the  first  r  singular  values  of  the  reconstructed 
measurement  matrix  are  non-zero.  However,  this  method  becomes  of  questionable 
utility  when  any  result  from  sub-block  computation  is  inaccurate. 

(4)  There  are  statistical  ways  to  improve  the  solution,  for  example,  the  metrically 
Winsorised  residuals  method  [18].  This  method  is  based  on  the  assumption  that  each 
measurement  is  corrupted  by  additive  Gaussian  noise.  The  metrically  Winsorised 
residuals  method  adjusts  the  weight  for  each  measurement  depending  on  its  residual 


error. 


3  Merging  Multiple  Views 
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Principal  component  analysis  with  missing  data  has  been  formulated  to  a  WLS  minimiza¬ 
tion  problem  in  the  previous  section  and  a  PCAMD  algorithm  is  proposed  to  solve  it.  From 
the  motivational  example  of  a  dodecahedron  it  is  clear  that  object  modeling  from  a  sequence 
of  views  should  be  formulated  as  a  WLS  problem. 

In  this  section,  we  show  that  multiple  view  merging  can  be  formulated  as  a  combination  of 
two  WLS  problems.  The  first  WLS  problem  involves  rotation  matrices  and  surface  normals, 
which  is  independent  of  translation.  Once  the  first  problem  is  solved,  the  second  WLS  prob¬ 
lem  yields  translation  vectors  and  normal  distances  to  the  origin.  The  first  WLS  problem  of 
determining  rotation  matrices  and  surface  normals  can  be  further  simplified  by  representing 
the  rotation  matrix  using  the  quaternion.  A  straightforward  two-step  iterative  algorithm  can 
be  devised  to  solve  these  two  problems  using  the  PCAMD  algorithm  from  the  previous  sec¬ 
tion. 


3.1  Two  WLS  problems 

Suppose  that  we  have  tracked  P  planar  regions  over  F  frames.  We  then  have  trajectories 

of  plane  coordinates  { (v/p,  dfp)  |  /=  1 . F,p=\ . P)  where  v/p  is  the  surface  normal  of  the 

pth  patch  in  the  fth  frame,  and  dfp  is  the  associated  normal  distance  to  the  origin.  To  facilitate 
the  decomposability  of  rotation  and  translation,  instead  of  forming  a  4 FxP  measurement 
matrix  as  in  section  2.1,  we  form  surface  normals  vf  into  a  3 FxP  matrix  w,vl  and  dis¬ 
tances  dfp  into  an  FxP  matrix  wid) .  w(v)  and  wiU)  are  called  the  normal  measurement 
matrix  and  distance  measurement  matrix  respectively. 

It  can  be  easily  shown  that  wlv)  has  at  most  rank  3  and  wtJ)  has  at  most  rank  4  when  noise- 
free,  therefore,  w<v)  and  wid)  are  highly  rank-deficient.  We  decompose  w(v)  into 

W(v,)  =  R  V  (EQ  29) 


R  = 


r(F) 


where 
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is  the  rotation  matrix  of  each  view  with  respect  to  the  world  coordinate  system,  and 

v  *  [v, . v,i  is  the  surface  normal  matrix  in  the  world  coordinate  system.  Since  R  is  a 

3F  x  3  matrix  and  V  is  an  3  x  P  matrix,  the  rank  of  w(v)  is  at  most  3. 

Similarly,  we  can  decompose  wid)  into 


where 


(EQ  30) 


tj  and  Rf  are  the  translation  vector  and  rotation  matrix  of  view  /  with  respect  to  a  fixed 
world  coordinate  system. 


Note  that  decomposition  of  w(<0  depends  on  the  decomposition  of  w(v) .  Since  M  is  4  xP 
and  T  is  F  x  4,  the  rank  of  w,<‘/)  is  at  most  4. 


We  can  also  decompose  wid)  into 


(EQ  3 1 ) 


When  all  elements  in  the  two  measurement  matrices  are  known,  we  need  to  solve  two  least- 
squares  problems.  However,  since  only  part  of  the  planar  regions  are  visible  in  each  view, 
we  end  up  with  two  WLS  problems  instead.  The  first  least  squares  problem,  labeled  as 
WLS-1,  is 


/=/ . F,p-l,...,P 


(EQ  32) 


and  the  second  one,  denoted  as  WLS-2,  is 


/=/ . F,p=l . P 


(EQ  33) 


where 
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v,  *  0  if  surface  p  is  invisible  in  frame  f,  and  yf  =  1  otherwise.  All  weights  can  be  any 
9 JtP  J»P 

number  between  zero  and  one,  depending  on  the  significance  or  confidence  of  each  mea¬ 
surement.  A  similar  WLS  formulation  is  also  used  in  [18]. 

3.2  Quaternion  WLS-1 

It  appears,  from  the  last  section,  that  we  can  devise  a  naive  two-step  algorithm  which  solves 
WLS-1  and  subsequently  WLS -2  by  applying  the  PC  AMD  algorithm  to  both  problems. 
However,  in  order  to  solve  WLS-1,  we  iterate  as  if  it  had  nine  independent  parameters, 
while  it  is  a  nonlinear  trigonometric  function  of  three  parameters.  Although  it  is  possible  to 
normalize  after  every  iteration,  it  may  perform  poorly  in  terms  of  robustness  and  effi¬ 
ciency. 

In  fact,  several  representations  of  rotation  are  often  used  in  practice: 

(1)  An  orthonormal  rotation  matrix  R 

(2)  An  rotation  axis  a  and  an  rotation  angle  6 

(3)  A  unit  quaternion  q 

A  quaternion  is  a  4-tuple  (w,s)  where  w  is  a  3-vector  and  s  is  a  scalar.  The  mapping  between 
a  unit  quaternion  and  a  rotation  axis  along  with  a  rotation  angle  is  given  by  w  =  sin  ( 0/2 )  a 
and  s  =  cos  (0/2).  The  quaternion  representation  of  the  rotation  matrix  leads  to  a  simple 
way  of  solving  minimization  problems  of  3D  point  matching  and  surface  normal  matching, 
as  demonstrated  in  [8]. 


The  WLS-1  problem  (EQ  32)  can  be  decomposed  into  F  minimization  problems 

p 

min  S^/V)  Sw/-/?t/)vJ  (EQ34) 

p  =  i 

where 


/  =  1,  .  ,F,  v/f  =  [Wf  ,, 


wf  />]  T,  yjv)  =  Jiag(Yf],...,yfp). 


The  above  problem  can  be  reformulated  using  quaternion  as 
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min 


p  -  * 


(EQ  35) 


subject  to  \q\  =  1 

where  q  is  the  conjugate  quaternion  of  q,  and 


p  =  i 


=  X  T/w>Iwy*(/) =  X 


(EQ  36) 


P  =  1 


P  -  I 


are  synunetric  matrices  because  ny?  ^  -  q  ^  vp  is  a  linear  function  of  q  ^ .  Obviously 

=  x  AT  <EQ37) 

/>  =  i 

is  also  symmetric,  and  the  minimization  problem  ()  becomes 

min  q w  q^  (EQ38) 

The  solution  to  the  above  minimization  problem  is  the  eigenvector  q^n  corresponding  to 
the  minimum  eigenvalue  of  the  matrix  . 

3 3  Iterative  algorithm 

We  combine  quaternion-based  rotation  matrix  updating  to  form  a  two-step  algorithm  to 
solve  both  the  first  and  the  second  WLS  problems.  The  algorithm  is  as  follows: 


Algorithm  two-step  WLS’s 
Step  0  Initialization 

(0.1)  read  in  measurement  matrices  W ( v),  W(<^ 
(0.2)  read  in  weight  matrices  y(v) ,  y(<*) 

(0.3)  initialize  R,  vectorize  R  to  v 
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Step  1  WLS-1 


(1.1)  vectoriieW(v)  towvjandwy2 

(1.2)  update 
(1J)  update 


(1.4)  update 


u 


vl 


B 


(ft 


(li)  update 


and  transform  to  R,  vectorize  to  v 
(1.6)  go  to  (12)  if  not  converged,  otherwise  advance  to  Step  2 


Step  2  WLS-2 


(2.1)  vectorize  W(<*)  to  w^j  and 
(2J)  update  HjY 
(23)  update 


(2.4)  update 
(23)  update 


u  =  H 


dywdl 


'  =  Kdr"d2 

(2.6.)  stop  if  converged,  otherwise  go  to  (2.2). 


We  have  not  explicitly  discussed  the  normalization  problem  in  our  WLS  approach.  The  nor¬ 
malization  problem  occurs  because  the  measurement  matrix  is  rank-deficient,  hence,  there 
are  infinite  solutions  to  the  minimization  problem  (EQ  1)  unless  an  additional  constraint  is 
imposed.  This  additional  constraint  is  generally  problem  dependent;  for  example,  the  2- 
norm  of  the  factorized  left  matrix  is  constrained  to  be  one  [15].  Fortunately,  we  have  implic¬ 
itly  constrained  our  rotation  matrices  in  its  quaternion  representation.  The  remaining  con¬ 
straint  in  the  first  WLS  is  the  normalization  of  surface  normal  vectors  which  are  constrained 
to  be  of  unit  magnitudes. 

Prior  to  multiple  view  merging,  we  need  to  track  surfaces  so  that  normal  measurement 
matrix  and  distance  measurement  matrix  can  be  formed.  The  next  section  describes  our  sur¬ 
face  tracking  algorithm. 
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4  Surface  Patch  Tracking 

In  this  section,  we  briefly  overview  each  module  of  our  surface  patch  tracking  system:  range 
image  segmentation,  adjacency  graph  building,  and  two- view  matching. 

4.1  Range  image  segmentation 

There  are  many  different  techniques  for  range  image  segmentation.  By  and  large  they  can  be 
divided  into  feature-based  and  primitive-based  approaches,  although  statistics-based 
approaches  have  also  been  introduced  recently.  Feature-based  approaches  yield  precise  seg¬ 
mentation  but  are  sensitive  to  noise  in  practice.  For  example,  Gaussian  and  mean  curvatures 
can  be  used  to  label  different  regions  before  region  growing;  however,  this  process  is  quite 
sensitive  to  noise  because  of  the  second  order  derivative.  Primitive-based  approaches  are 
more  robust  to  noise  but  constrained  by  the  number  of  primitives.  The  higher  the  degree  of 
surface  polynomial,  the  more  difficult  and  the  less  robust  the  segmentation  is  likely  to  be. 

We  have  used  the  primitive-based  region  growing  segmentation  method  of  [8].  The  two 
types  of  surface  primitives  used  are  the  planar  surface  and  the  quadric  surface.  The  regions 
are  established  via  region  growing  from  seed  points,  i.e.,  the  seed  points  are  chosen  from 
points  which  are  closest  to  their  approximating  primitives,  and  then  merged  with  their 
neighbors  until  the  best-fit  errors  become  unacceptable. 

4.2  Adjacency  graph 

Once  we  have  successfully  segmented  the  range  data  for  each  view,  the  range  image  associ¬ 
ated  with  view  i  can  be  represented  as  a  set  of  planar  regions  /(  =  {  vtj,  dijy  c-y},  where  vtJ 
and  dr  are  the  normal  and  distance  of  the  jth  segment  planar  surface  respectively,  and  cl}  is 
the  centroid  of  jth  segmented  region. 

From  each  view  of  the  3D  object,  we  build  an  adjacency  graph  where  every  node  in  the 
graph  represents  a  visible  planar  region  and  each  arc  connects  two  adjacent  nodes.  The  adja¬ 
cency  graph  is  updated  whenever  this  view  is  matched  with  another.  Eventually  we  have 
adjacency  information  among  all  visible  planar  regions  after  tracking  all  of  them  for  the 
whole  sequence.  From  the  adjacency  graph,  all  the  object  vertices  can  be  located;  thus,  3D 
object  model  is  obtained.  However,  augmenting  adjacency  graph  is  difficult  for  concave 
objects  because  of  occlusion.  A  better  way  of  establishing  spatial  connectivity  among  all 
surfaces  is  discussed  in  section  5. 
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We  have  implemented  the  planar  surface  patch  tracking  system  which  employs  a  wavefront 
algorithm  to  generate  the  adjacency  graph.  The  wavefront  algorithm  makes  use  of  range 
data  because  there  is  significant  change  in  range  data  across  an  occluding  edge. 

43  Matching  two  views 

Given  two  adjacent  segmented  images  /,  and  /2,  we  would  like  to  find  correspondence 
between  different  regions  in  two  views,  i.e.,  we  want  to  find  a  mapping  <f>:  (/,  — » /2)  such 

that  a  certain  distance  measurement  d  (/,,  /2)  is  minimized. 

Two  questions  arise  in  matching  two  views  of  planar  regions.  The  first  is  how  to  make  corre¬ 
spondence  between  two  views;  the  second  is  how  to  recover  the  transformation  between 
them.  Our  solution  to  the  first  problem  is  to  use  adjacency  information  between  two  seg¬ 
mented  patches  and  between  segmented  surface  normals.  If  displacement  between  two 
views  is  relatively  small,  there  should  be  only  linear  shape  change  [11]  within  the  same 
aspect,  corresponding  segmented  regions  are  of  similar  size  (number  of  points),  centroid, 
and  surface  normals.  When  a  new  aspect  appears,  which  signals  a  nonlinear  shape  change, 
there  would  be  significant  change  in  these  parameters.  There  may  not  always  be  solutions  to 
the  second  problem  because  we  need  at  least  two  corresponding  non-parallel  faces  to  deter¬ 
mine  rotation  and  three  to  determine  translation.  In  practice,  we  can  make  the  assumption 
that  we  always  have  two  non-parallel  corresponding  faces  in  two  adjacent  views. 

In  fact,  solving  the  second  problem  can  be  of  help  to  the  first  problem  because  we  can  then 
make  use  of  the  hypothesis-and-test  approach.  We  iteratively  select  two  pairs  of  non-parallel 
faces  from  the  two  images  to  be  matched,  estimate  the  corresponding  rotation  matrix,  and 
then  attempt  to  match  the  rest  of  the  faces.  We  always  choose  two  adjacent  faces  from  both 
images,  and  match  them  based  on  surface  normal,  distance  and  centroid  of  segment  regions. 
The  number  of  faces  matched  and  consistency  in  face  adjacency  are  used  in  the  distance 
measure  between  two  matches.  The  estimated  transformation  matrix  is  only  used  to  help 
building  the  adjacency  graph,  while  the  precise  transformation  is  robustly  recovered  from 
our  WLS  method. 

Multiple  view  tracking  is  done  by  sequentially  matching  two  adjacent  views.  Whenever  a 
new  view  is  added,  the  adjacency  graph  and  the  weight  matrix  are  automatically  modified. 
Because  of  the  problems  associated  with  updating  adjacency  graph,  subsequent  to  surface 
patch  tracking  and  multiple  view  merging,  we  use  another  algorithm  to  establish  the  spatial 
connectivity  among  surfaces. 
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5  Spatial  Connectivity 

Once  we  have  extracted  the  equations  of  planar  surfaces  of  the  object,  we  then  need  to 
establish  spatial  connectivity  relationship  among  these  surfaces.  One  approach  is  to  build 
adjacency  graph  from  a  sequence  of  views,  as  discussed  in  previous  section.  However,  aug¬ 
menting  adjacency  graph  whenever  a  new  view  is  introduced,  is  quite  ad-hoc.  In  this  sec¬ 
tion,  we  present  a  new  approach  of  recovering  surface  connectivity  after  all  surface  patches 
are  recovered.  We  show  that  the  problem  of  spatial  connectivity  of  boundary  surfaces  can  be 
reduced  to  one  of  connectivity  of  supporting  lines  of  a  simple  polygon. 

5.1  Half-space  intersection  and  union 

We  assume  that  every  planar  patch  P  of  object  model  is  a  simple  polygon.  A  simple  polygon 
does  not  self-intersect.  Every  (infinite)  plane  divides  the  space  into  two  parts,  inside  and  out¬ 
side,  with  surface  normal  pointing  towards  the  external  side  of  the  object.  Given  an 
unbounded  planar  surface,  if  we  intersect  all  other  planar  surfaces  on  it,  we  obtain  support¬ 
ing  lines  as  illustrated  in  Fig.2.  Each  supporting  line  is  directed  so  that  the  interior  of  P  lies 
locally  to  its  right.  The  right  half-plane  created  by  such  a  directed  supporting  line  c  is  called 
the  supporting  half-plane,  and  is  characterized  as  supporting  the  polygon  [6];  however,  a 
concave  P  might  not  all  lie  in  the  right  half-plane  as  indicated  in  Fig.2. 


Figure  2  A  simple  polygon  and  its  supporting  lines  (stippled  and  solid  lines) 
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For  each  point  x  in  the  plane,  if  we  know  which  side  of  each  supporting  line  x  lies  on,  we 
know  if  x  is  inside  P.  Therefore,  the  polygon  P  (and  its  interior)  can  be  represented  as  a  bool¬ 
ean  formula  whose  atoms  are  those  supporting  lines.  In  other  words,  a  simple  polygon  can 
be  represented  by  intersection  and  union  of  its  supporting  line.  For  example,  a  boolean  for¬ 
mula  for  the  polygon  in  Fig.2  can  be  ca  ©  ab  ©  bd  ©  dc.  This  Guibas  style  [6]  formula  is 
obtained  by  complementing  the  second  supporting  line  at  a  convex  angle,  and  the  first  sup¬ 
porting  line  at  a  concave  angle  when  we  go  around  the  polygon.  Other  boolean  formulas 
such  as  Peterson  style  are  also  possible  [6].  Once  spatial  connectivity  is  established,  the 
Guibas  style  formula  is  straightforward. 


5.2  Modified  Jarvis’  march 

The  problem  of  establishing  spatial  connectivity  of  supporting  lines  can  be  formulated  as  a 
modified  convex  hull-like  problem  which  involves  only  vertices.  This  problem  can  also  be 
regarded  as  one  of  cell  decomposition  which  involves  data  points.  We  propose  a  modified 
Jarvis’  march  algorithm  to  reconstruct  simple  polygons  from  supporting  lines  and  valid  data 
points.  The  algorithm  to  recover  spatial  connectivity  among  3D  surfaces  is  discussed  in  sec¬ 
tion  5.3. 


Definition  1 

A  point  is  defined  as  valid  in  a  simple  polygon  if  there  exist  sufficient  valid  data  points  around 
its  neighborhood. 


Lemma  2 

The  intersection  point  P  of  two  supporting  lines  is  valid  in  a  simple  polygon  if  and  only  if  the 
intersection  of  two  corresponding  half-planes  is  valid  locally  at  P. 

Proof: 

When  the  intersection  of  two  half-planes  is  valid  localty  at  P,  the  intersection  point  of 
these  two  supporting  lines  is  valid  by  its  definition. 

Assume  that  the  intersection  point  of  two  supporting  lines  is  valid.  Since  two  lines  divide 
the  plane  into  four  regions,  there  must  exist  at  least  one  such  region  among  four  around  the 
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intersection  point  that  is  a  valid  cell  of  the  simple  polygon.  Therefore,  the  intersection  of 
two  half-planes  is  valid  locally  at  P.  (Q.E.D.) 

The  Lemma  2  leads  to  a  modified  Jarvis’  march  algorithm  of  reconstructing  simple  polygon 
from  supporting  lines  and  valid  data  points. 


To  construct  the  simple  polygon  from  all  supporting  lines  and  valid  data  points,  we  first  pre¬ 
compute  all  intersection  points  which  are  candidates  of  vertices  of  the  simple  polygon.  If  we 
march  successive  vertices  with  the  least  turning  angle,  we  obtain  their  convex  hull;  this  is 
referred  to  as  Jarvis’  march  algorithm  [14].  The  kernel  of  the  simple  polygon,  if  it  exists,  can 
also  be  found  by  intersecting  all  half-spaces.  Using  Lemma  2,  however,  enables  us  to  find 
the  correct  simple  polygon  by  marching  all  points  whose  local  neighborhood  is  valid.  We 
call  this  algorithm  the  “modified  Jarvis’  march”. 

Assume  that  we  have  first  found  the  lowest  left  point  pi  of  the  set  of  vertex  candidates, 
which  is  certainly  a  convex  hull  vertex,  but  not  necessarily  a  vertex  for  our  simple  polygon 
(unless  it  is  valid  locally).  For  example,  in  Fig.3,  pi  is  not  a  simple  vertex  because  p5plp2  is 
not  a  valid  triangle  cell  (valid  cells  are  shaded  areas  which  show  valid  data  points).  Since 
p6p2p3  is  a  valid  triangle  cell,  we  start  our  algorithm  from  p2. 


pl3  i  pl4  ]  pl5  = 


p5‘ 


. -* . 

pid  pii  , 

pi  6 

. 

pl2 

L— ... 

pH 

P/ 

pS 

M . 4 

L . i . 

1  pi  ;  p2 

j  p3 

p4 

Figure  3  Example  of  modified  Jarvis’  march  and  cell  decomposition.  Shaded  area  represents  valid  data 
points. 
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A  data  structure  is  defined  for  each  intersection  point  P  as  follows 

typedef { 

intersect-point  left,  right,  up,  down; 
intersect-point  previous,  next; 

}  intersect-point  P; 

Fig.4  shows  the  relationship  among  the  members  of  the  data  structure.  Assume  that  an  inter¬ 
section  point  is  intersected  by  only  two  supporting  lines. 


P->up  =  P4 


P->right  =  P3 


P->next  =  ? 


P->left  =  Pi  P->down  =  P2 

P->previous  =  P| 


Figure  4  Illustration  of  data  structure  of  intersection  point 


After  the  starting  vertex  is  found,  we  march  for  the  next  vertex  as  illustrated  in  Fig.4.  If 
there  are  sufficient  data  points  in  cell  PP2P3,  next  valid  vertex  is  P2;  If  P2  is  not  valid,  we 
check  if  P3  is  valid;  If  P3  is  also  invalid,  P4  must  be  valid,  or  an  error  will  occur.  The  march 
ends  when  the  next  vertex  is  the  starting  vertex.  The  modified  Jarvis’  march  (MJM)  algo¬ 
rithm  is  given  as  follows: 


Algorithm  MJM 

Step  l.  initialize  starting  vertex 

START->previous  =  NULL, 
P  =  START->next, 
P->previous  =  START; 
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Step  2.  march 

P->left  =  P|j  P->down  =  P2;  P->right  -  P3;  P->up  =  P4; 

If  cell  PP2P3  valid,  P->ncxt  =  P2  (case  I) 
else  If  cell  PP3P4  valid,  P->next  =  P3  (case  2) 
else  If  cell  PP4P1  valid,  P->next  =  P4  (case  3) 
else  error  occurs; 

Step  3.  terminate 
If  P->next  =  START. 

A  postprocessing  step  may  be  necessary  to  remove  points  which  belong  to  case  2  in  step  2  of 
the  march  algorithm.  These  points  are  on  the  same  line  with  its  previous  point  and  its  next 
point.  For  example,  in  Fig.3,  pl2  can  be  removed  because  p8  and  pl6  make  it  redundant. 

As  can  be  seen  from  the  above  algorithm  and  Fig.3  as  well  as  Fig.4,  the  problem  of  single 
polygon  reconstruction  from  supporting  lines  and  valid  data  points  is  one  of  cell  decomposi¬ 
tion.  As  we  march  around  all  supporting  lines,  the  Guibas  style  boolean  formula  of  the  sim¬ 
ple  polygon  can  be  readily  formulated. 


53  3D  spatial  connectivity 

So  far  we  have  discussed  the  problem  of  recovering  the  connectivity  of  supporting  lines  of  a 
simple  polygon.  The  approach  uses  information  at  both  signal  level  (real  data  points)  and 
algebraic  level  (line  equations).  The  same  hybrid  approach  can  be  applied  to  the  problem  of 
spatial  connectivity  of  planar  surfaces  in  3D. 

Indeed,  the  problem  of  connectivity  of  planar  surfaces  in  3D  can  be  reduced  to  a  set  of  prob¬ 
lems  of  connectivity  in  2D.  Assume  that  we  have  recovered  a  set  of  N  face  equations  and 
transformation  among  different  views  (e.g.,  from  PCAMD).  All  valid  data  points  from  mul¬ 
tiple  views  can  be  merged  in  the  same  world  coordinate  system.  For  each  face  F0  if  we 
intersect  all  other  N-l  faces  Fj(J  =  1, ...,  N-  1 ,  j  *  /).  with  F(  and  project  all  these  lines 
onto  F4,  we  get  M  ( =N- 1 )  supporting  lines  on  face  F,-.  We  also  project  nearby  3D  points  onto 
this  face  Ft.  Without  loss  of  generality,  we  assume  that  no  two  supporting  lines  are  parallel 
(or  a  normal  threshold  d  can  be  set  such  that  v-v,  >  d).  For  any  of  the  M  supporting  lines,  we 
intersect  it  with  the  rest  M-l  lines,  we  get  all  possible  candidates  for  vertices  of  the  valid 
simple  polygon  which  is  the  model  of  face  F,,  as  illustrated  in  Fig.5.  The  modified  Jarvis’ 
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march  algorithm  can  be  then  applied  to  each  of  the  N  faces  accordingly.  By  connecting  all 
polygons  recovered,  we  get  the  entire  3D  object  model  boundary.  A  simple  algorithm  can  be 
accordingly  constructed  to  establish  3D  spatial  connectivity  then. 

Fig.  S  shows  an  example.  It  is  a  face  of  a  toy  house  model.  The  complete  house  model  is 
reconstructed  and  presented  in  next  section.  Fig. 5(a)  shows  intersections  of  supporting  lines 
and  nearby  data  points  projected  on  this  face,  while  Fig.5(b)  superimposes  a  reconstructed 
simple  polygon  model  of  this  face  on  Fig.5(a). 


(a)  (b) 


Figure  5  Reconstruction  of  connectivity.  The  tiny  dots  represent  projected  nearby  data  points. 

Intersections  of  supporting  lines  are  represented  by  black  circles.  Vertices  of  reconstructed 
simple  polygon  are  represented  by  small  squares. 
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6  Experiments 


In  this  section,  we  present  results  of  applying  our  algorithm  on  synthetic  data  and  on  real 
range  image  sequence  of  objects.  We  demonstrate  the  applicability  and  the  robustness  of  our 
approach  using  synthetic  data,  and  present  the  recovered  model  from  real  range  images. 

6.1  Synthetic  data 

Our  synthetic  data  set  consists  of  a  set  of  12  planes  as  in  the  case  of  the  dodecahedron  in  the 
last  section.  A  dodecahedron  with  4  different  views  is  shown  in  Fig.  1  in  section  2. 

6.1.1  Applicability 

In  this  section  we  study  the  applicability  of  the  proposed  approach.  In  order  to  recover  the 
shape  of  a  dodecahedron,  given  correspondence,  how  many  views  are  necessary? 

For  example,  we  pick  4  distinct  views  from  the  viewing  sphere  so  that  there  is  no  singular¬ 
ity.  Singularity  occurs  when  less  than  6  faces  are  visible.  For  example,  we  can  formulate  two 
measurement  matrices  for  surface  normals  and  planar  distances  as  follows: 
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In  order  to  solve  the  first  WLS  problem  uniquely  for  F  frames,  we  need 


(EQ  39) 


(EQ  40) 


18F>  3F+  3P 


since  we  have  18F  equations  and  3F  unknowns  for  rotation  matrices  and  3P  unknowns 
for  surface  normals.  For  the  second  problem,  we  have  6F  equations,  but  there  are  only  F 


unknown  translation  vectors  and  P  unknown  plane  distances  after  using  results  from  the 
first  problem.  Therefore,  the  necessary  condition  to  uniquely  solve  the  second  problem  is 

6F23F  +  P 

Since  P  is  12,  F>4  is  the  unique  solution  for  both  problems.  Again,  we  are  not  concerned 
with  the  normalization  problem  here. 

6.1.2  Robustness 

We  study  the  effectiveness  of  our  approach  when  data  is  corrupted  by  noise  and  mismatch¬ 
ing  occurs.  Our  synthetic  data  consists  of  a  set  of  12  surface  patches  randomly  distributed 
around  all  faces  of  a  dodecahedron.  Correspondence  is  assumed  to  be  known.  Only  the  first 
WLS  problem  is  studied  because  of  the  similarity  between  those  two  WT  S  problems.  The 
minimization  of  weighted  squares  distance  between  reconstructed  and  given  measurement 
matrices  leads  to  the  recovery  of  surface  equations  and  transformations. 

To  study  the  error  sensitivity  on  reconstruction  of  our  algorithm,  we  take  four  nonsingular 
views  of  the  dodecahedron  where  each  component  in  every  surface  normal  is  corrupted  by  a 
Gaussian  noise  of  zero-mean  and  variable  standard  deviation.  As  we  have  shown  in  the  pre¬ 
vious  section,  at  least  4  views  are  required  to  recover  the  dodecahedron  model.  Fig  6  shows 
that  our  algorithm  converges  in  a  few  steps.  The  cases  with  standard  deviation  a  of  0.05, 
0.1,  0.2,  and  0.5  are  studied.  Notice  that  the  case  with  standard  deviation  of  0.5  yields  very 
noisy  original  data.  As  we  take  more  views,  the  sum  of  weighted  squares  error  is  reduced. 
Fig.  7  plots  the  normalized  weighted  least  squares  error  for  4  views,  8  views,  12  views,  and 
16  views  respectively,  while  gaussian  noise  with  0.5  standard  deviation  is  present.  The 
squares  errors  are  normalized  because  the  number  of  observations  increases  as  more  views 
are  introduced. 

The  more  interesting  case  is  when  mismatching  occurs.  Obviously  if  the  face  appears  only 
once  in  the  whole  sequence,  then  its  reconstruction  depends  on  the  amount  of  noise.  When 
this  face  appears  in  more  and  more  views,  its  reconstruction  using  our  WLS  method  is  aver¬ 
aged  over  these  views.  Fig.  8  gives  the  reconstructed  errors  of  a  face  which  appeared  12 
times  in  16  views.  When  only  two  views  are  matched,  the  reconstructed  surface  normal  is 
deviated  from  its  normal  by  18.2  and  38.9  degrees  when  standard  deviation  a  is  0. 1  and  0.2. 
respectively.  When  more  views  are  added,  the  angle  between  the  reconstructed  surface  nor¬ 
mal  and  its  normal  decreases  to  around  10  and  20  degrees  respectively. 
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When  an  observed  surface  normal  is  wrong  in  one  particular  view,  the  conventional  sequen¬ 
tial  reconstruction  method  results  in  an  erroneous  recovered  surface  normal  and  transforma¬ 
tion.  The  errors  propagate  as  new  views  are  introduced  regardless  of  the  number  of  views  in 
which  this  surface  is  visible.  However,  our  WLS  approach  gives  appreciably  smaller  recon¬ 
struction  error  on  this  observed  surface  normal  by  distributing  the  errors  in  all  views.  In  any 
case,  in  general,  our  approach  cannot  be  worse  than  the  sequential  approach.  Fig.  9  com¬ 
pares  the  reconstructed  errors  of  sequential  method  and  WLS.  There  are  12  observations  of 
this  surface  normal  in  16  views  and  its  first  observation  is  off  by  an  angle  between  0"  and 
40°.  The  reconstructed  models  of  sequential  method  and  WLS  method  are  shown  in  Fig.  6 
along  with  the  original  model,  for  the  case  of  40°  angle  deviation  of  one  surface  normal  in 
the  first  view.  Fig.  10a  shows  a  badly-skewed  model,  which  is  the  worst  case  from  the 
sequential  method  since  the  error  was  introduced  in  the  first  frame.  Fig.  10b  shows  the 
reconstructed  model  by  the  WLS  method  while  the  original  dodecahedron  model  is  pre¬ 
sented  in  Fig.  10c. 


Sum  of  Weighted  Squares  Error 
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Figure  6  Effect  of  noise 
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Figure  8  Reconstructed  error  vs.  number  of  matched  faces 
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Figure  9  Comparison  between  sequential  reconstruction  and  WLS  method 


Figure  10  Recovered  and  original  dodecahedron  models  (a)  worst  case  of  sequential  method;  (b)  our 
WLS  method;  (c)  original  model 


6.2  Real  range  image  sequence 

We  have  applied  our  algorithm  to  a  sequence  of  range  images  of  a  polyhedral  object,  using 
the  planar  region  tracker  described  in  section  3.  Figures  11(a)  and  11(b)  show  the  whole 
sequence  of  12  views  and  their  corresponding  segmentation  results.  Segmentation  is  not 
perfect  in  several  views.  Figure  12  shows  the  result  of  our  system,  two  shaded  views  of 
recovered  object  model.  Figures  13  and  14  show  another  example  of  a  toy  house. 
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Figure  II  A  sequence  of  images  of  a  polyhedral  object  (a)  original  images;  (h)  after  segmentation 
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Figure  14  Four  views  of  texture  mapped  display  of  a  reconstructed  house  model 


7  Concluding  Remarks 


An  object  modeling  approach  using  multiple  range  images  has  been  described  in  this  paper. 
The  boundary  representation  object  model  is  recovered  and  integrated  from  different  views. 
One  significant  contribution  of  this  work  is  the  application  of  principal  component  analysis 
with  missing  data  to  object  modeling  from  a  sequence  of  views.  An  inherent  problem  in 
multiple  view  integration  is  that  the  information  observed  from  each  view  is  incomplete  and 
noisy.  Based  on  Wiberg’s  formulation,  we  have  generalized  principal  component  analysis 
with  missing  data  as  a  WLS  minimization  problem  and  presented  an  efficient  algorithm, 
PCAMD,  to  solve  it. 

With  zero  weights  assigned  to  the  unobservable  data,  merging  different  views  can  be  formu¬ 
lated  as  a  combination  of  two  WLS  minimization  problems.  By  applying  the  PCAMD  algo¬ 
rithm  to  both  problems,  we  get  a  straightforward  two-step  algorithm  in  which  the  first  step 
computes  surface  normals  and  rotation  matrices  by  employing  the  quaternion  representation 
of  the  rotation  matrix;  the  subsequent  step  recovers  translation  vectors  and  normal  distances 
to  the  origin.  Experiments  on  synthetic  data  and  real  range  images  indicate  that  our 
approach  converges  quickly  and  produces  good  models  even  in  the  presence  of  noise  and 
mismatching.  An  accurate  polyhedral  object  model  reconstructed  from  a  sequence  of  real 
range  images  is  presented.  A  complex  toy  house  model  is  also  reconstructed. 

When  motion  between  two  views  is  relatively  small,  we  can  track  different  segmented  sur¬ 
face  patches  by  making  use  of  surface  normals,  distances,  centroids,  and  adjacency  informa¬ 
tion.  An  adjacency  graph  is  built  for  each  view  and  modified  as  the  viewing  direction 
changes.  A  significant  advantage  of  surface  patch  tracking,  as  opposed  to  other  methods 
such  as  point  matching  and  line  segment  tracking,  is  that  surface  patches  can  be  more  reli¬ 
ably  extracted  and  tracked.  One  problem  in  our  current  system  is  that  the  composite  graph 
has  to  be  accurately  recovered  to  extract  vertices  and  edges.  This  can  be  done  as  shown  in 
section  4  and  section  5.  However,  we  believe  that  the  composite  graph  may  not  be  neces¬ 
sary.  One  possible  solution,  which  we  are  currently  working  on,  is  to  combine  half-space 
intersection  and  union  of  surface  equations  and  range  data  to  eliminate  the  use  of  the  com¬ 
posite  graph. 

Another  contribution  of  this  work  is  a  hybrid  approach  of  establishing  spatial  connectivity 
of  boundary  surfaces.  The  spatial  connectivity  of  surfaces,  and  in  particular,  the  supporting 
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lines  of  a  simple  polygon,  can  be  obtained  by  combining  algebraic  equations  of  surfaces  and 
data  points  merged  from  multiple  views  once  transformation  is  recovered. 

“One  general  principle  in  computer  vision  is,  if  surface  information  is  not  enough  to  deter¬ 
mine  each  surface  locally,  use  global  constraints  that  constrain  relative  configuration  of  the 
surfaces  so  that  the  total  degrees  of  freedom  decrease.”  [17].  The  object  modeling  technique 
presented  in  this  paper  is  an  example  of  this  principle,  where  the  algebraic  structure  of  sur¬ 
face  equations  from  multiple  views  is  used  as  the  global  constraint.  The  recovered  object 
model  is  statistically  optimal  because  it  is  most  consistent  with  all  of  the  views  in  the  sense 
of  weighted  least  squares.  By  observing  and  employing  different  forms  of  input  redundancy, 
our  approach  can  be  easily  extended  to  other  vision  problems  such  as  shape  and  motion 
from  a  sequence  of  intensity  images.  We  are  also  working  on  applying  our  techniques  to 
more  complicated  scene  modeling. 
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